In [ ]:
%%HTML
<style>
.container { width:100% }
</style>

Logistic Regression Using TensorFlow

This notebook shows how to do handwritten digit recognition with logistic regression. It is adapted from an example by Aymeric Damien, who has a lot of nice notebooks discussing TensorFlow at https://github.com/aymericdamien/TensorFlow-Examples/.


In [ ]:
import gzip
import pickle
import random
import numpy             as np
import matplotlib.pyplot as plt

The function $\texttt{vectorized_result}(d)$ takes a digit $d \in \{0,\cdots,9\}$ and returns a NumPy vector $\mathbf{x}$ of shape $(10,)$ such that $$ \mathbf{x}[i] = \left\{ \begin{array}{ll} 1 & \mbox{if $i = d$;} \\ 0 & \mbox{otherwise.} \end{array} \right. $$ This function is used to convert a digit $d$ into the expected output of a neural network that has an output unit for every digit.


In [ ]:
def vectorized_result(d):
    e    = np.zeros((10, ), dtype=np.float32)
    e[d] = 1.0
    return e
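
For example, the digit $3$ is encoded as a one-hot vector with a $1$ at index $3$. The following quick check is illustrative and not part of the original notebook.


In [ ]:
vectorized_result(3)  # array([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.], dtype=float32)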

The data that we are using is stored as a gzipped, pickled file.

The function $\texttt{load_data}()$ returns a quadruple of the form $$ (\texttt{X_train}, \texttt{X_test}, \texttt{Y_train}, \texttt{Y_test}) $$ where

  • $\texttt{X_train}$ is a numpy.ndarray of shape $(50000, 784)$; each of the $50,000$ rows is a flattened $28 \times 28$ training image.
  • $\texttt{X_test}$ is a numpy.ndarray of shape $(10000, 784)$ containing the $10,000$ test images in the same format.
  • $\texttt{Y_train}$ is a numpy.ndarray of shape $(50000, 10)$; each row is the one-hot encoding of the correct digit for the corresponding training image.
  • $\texttt{Y_test}$ is a numpy.ndarray of shape $(10000, 10)$ containing the one-hot encodings of the correct digits for the test images.

We do not use the validation data that are provided in the file mnist.pkl.gz.


In [ ]:
def load_data():
    with gzip.open('mnist.pkl.gz', 'rb') as f:
        train, validate, test = pickle.load(f, encoding="latin1")  # validate is unused
    X_train = np.array([np.reshape(x, (784, )) for x in train[0]])
    X_test  = np.array([np.reshape(x, (784, )) for x in test [0]])
    Y_train = np.array([vectorized_result(y) for y in train[1]])  # one-hot labels
    Y_test  = np.array([vectorized_result(y) for y in test [1]])
    return (X_train, X_test, Y_train, Y_test)

In [ ]:
X_train, X_test, Y_train, Y_test = load_data()
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
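
If the data have loaded correctly, this displays the shapes $(50000, 784)$, $(10000, 784)$, $(50000, 10)$, and $(10000, 10)$.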

The function $\texttt{show_digits}(\texttt{rows}, \texttt{columns}, \texttt{offset})$ shows $\texttt{rows} \cdot \texttt{columns}$ images of the training data. The first image shown is the image at index $\texttt{offset}$.


In [ ]:
def show_digits(rows, columns, offset=0):
    f, axarr = plt.subplots(rows, columns)
    for r in range(rows):
        for c in range(columns):
            i     = r * columns + c + offset
            image = 1 - X_train[i,:]             # invert colours: dark digits on white
            image = np.reshape(image, (28, 28))  # rows are stored as flat 784-vectors
            axarr[r, c].imshow(image, cmap="gray")
            axarr[r, c].axis('off')              # hide the axes and their ticks
    plt.savefig("digits.pdf")
    plt.show()

In [ ]:
show_digits(5, 12)

In [ ]:
import tensorflow as tf

In order to avoid a crash caused by loading duplicate copies of the OpenMP runtime (a known issue with some TensorFlow installations, especially on macOS), we have to set the following environment variable.


In [ ]:
%env KMP_DUPLICATE_LIB_OK=TRUE

We create placeholders for the data. Below, None stands for the as yet unknown number of examples that will be fed in at run time; this way the same placeholders can be used both for minibatches and for the full test set.


In [ ]:
X = tf.placeholder(tf.float32, [None, 784]) # mnist data image of shape 28*28=784
Y = tf.placeholder(tf.float32, [None,  10]) # 0-9 digits recognition => 10 classes

Next, we create variables for the weights and biases. The variable W is the weight matrix, while b is the bias vector.


In [ ]:
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
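
Since X has shape $(\textit{batch}, 784)$ and W has shape $(784, 10)$, the product $X \cdot W + \mathbf{b}$ yields a $(\textit{batch}, 10)$ matrix of scores, one row per image. Initializing both variables with zeros is unproblematic for logistic regression: unlike in networks with hidden layers, there is no symmetry between units that would have to be broken by random initialization.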

We construct the model for logistic regression. Y_pred is the prediction vector. We use the softmax activation function. For a $d$-dimensional vector $\mathbf{z}$, this function is defined as $$ \sigma(\mathbf{z})_i := \frac{\exp(z_i)}{\;\displaystyle\sum\limits_{j=1}^d \exp(z_j)\;} $$ This function is predefined in TensorFlow as tf.nn.softmax. Here, the vector $\mathbf{z}$ is defined as $$ \mathbf{z} = \mathbf{x} \cdot W + \mathbf{b} $$


In [ ]:
Y_pred = tf.nn.softmax(tf.matmul(X, W) + b)
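
To make the softmax formula concrete, the following cell (an illustrative sketch, not part of the original notebook) recomputes it in plain NumPy. Subtracting $\max_i z_i$ first is a standard trick for numerical stability and does not change the result.


In [ ]:
def softmax(z):
    z = z - np.max(z)          # improves numerical stability, result is unchanged
    e = np.exp(z)
    return e / np.sum(e)

softmax(np.array([1.0, 2.0, 3.0]))  # ≈ array([0.0900, 0.2447, 0.6652])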

We use the cross entropy as a cost function. For a single example it is defined as $$ -\sum\limits_{i=1}^d \mathtt{Y}_i \cdot \ln(\mathtt{Y\_pred}_i) $$ Here, $\mathtt{Y}_i$ is the expected outcome, while $\mathtt{Y\_pred}_i$ is the output predicted by our model. Since we process many examples at once, the cost below is averaged over all examples in the batch.


In [ ]:
cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(Y_pred), axis=1))
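
As a small worked example (illustrative, not part of the original notebook): if the correct digit is $3$ and the model assigns probability $0.7$ to it, only the term for $i = 3$ survives in the sum, so the cross entropy is $-\ln(0.7) \approx 0.357$.


In [ ]:
y_true = vectorized_result(3)          # one-hot vector for the digit 3
y_pred = np.full(10, 0.3 / 9)          # spread the remaining probability mass evenly
y_pred[3] = 0.7
-np.sum(y_true * np.log(y_pred))       # ≈ 0.3567, i.e. -ln(0.7)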

We set some hyperparameters. We will use stochastic gradient descent with a minibatch size of $100$.


In [ ]:
learning_rate   = 0.05
training_epochs = 50
batch_size      = 100
num_examples    = X_train.shape[0]

We use stochastic gradient descent to minimize this cost function.


In [ ]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
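
Concretely, every optimizer step updates the parameters in the direction of steepest descent of the cost: $$ W \leftarrow W - \alpha \cdot \frac{\partial\, \texttt{cost}}{\partial W}, \qquad \mathbf{b} \leftarrow \mathbf{b} - \alpha \cdot \frac{\partial\, \texttt{cost}}{\partial \mathbf{b}}, $$ where $\alpha$ is the learning rate set above. TensorFlow computes the required gradients automatically via backpropagation.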

The function $\texttt{next_batch}(s)$ returns the next batch of size $s$. It returns a pair of the form $(X, Y)$ where $X$ is a matrix of shape $(s, 784)$ and $Y$ is a matrix of shape $(s, 10)$. The function advances the global variable count by $s$ so that successive calls return successive batches.


In [ ]:
count = 0

In [ ]:
def next_batch(size):
    global count
    X_batch  = X_train[count:count+size,:]
    Y_batch  = Y_train[count:count+size,:]
    count   += size
    return X_batch, Y_batch
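
As a quick sanity check (illustrative, not part of the original notebook), a batch of size $3$ should consist of matrices of shape $(3, 784)$ and $(3, 10)$:


In [ ]:
count = 0                      # reset the global counter before the check
Xb, Yb = next_batch(3)
Xb.shape, Yb.shape             # expected: ((3, 784), (3, 10))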

In [ ]:
%%time
init = tf.global_variables_initializer()
with tf.Session() as tfs:
    tfs.run(init)
    for epoch in range(training_epochs): 
        count = 0
        avg_cost = 0.0
        num_batches = int(num_examples / batch_size)
        # Loop over all batches
        for i in range(num_batches):
            X_batch, Y_batch = next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = tfs.run([optimizer, cost], {X: X_batch, Y: Y_batch})
            # Compute average loss
            avg_cost += c / num_batches
        print("Epoch:", '%2d,' % epoch, "cost =", "{:.9f}".format(avg_cost))
    print("Optimization Finished!")
    # Test model
    correct = tfs.run(tf.equal(tf.argmax(Y_pred, 1), tf.argmax(Y, 1)), {X: X_test, Y: Y_test})

print("Accuracy:", np.sum(correct) / len(correct))
